Feature Reduction and Nearest Neighbours
Authors
Abstract
Feature reduction is a major preprocessing step in the analysis of high-dimensional data, particularly data from biomolecular high-throughput technologies. Reduction techniques are expected to preserve the relevant characteristics of the data, such as neighbourhood relations. We investigate the neighbourhood preservation properties of feature reduction empirically and theoretically. Our results indicate that nearest and farthest neighbours are more reliably preserved than other neighbours in a reduced feature set.
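The kind of empirical check the abstract describes can be sketched with a toy experiment: rank each point's neighbours in the full feature space and in a reduced one, then count how often the neighbour at a given rank survives the reduction. Everything below is hypothetical (synthetic Gaussian data, and a deliberately naive "reduction" that just keeps the first few coordinates); it illustrates the measurement, not the paper's actual method.

```python
import random
import math

random.seed(0)

# Hypothetical synthetic data: 50 points in 10 dimensions.
n, d, d_reduced = 50, 10, 4
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

def dist(a, b, dims):
    """Euclidean distance restricted to the feature subset `dims`."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in dims))

def neighbour_ranks(X, dims):
    """For each point, list all other points sorted by distance."""
    order = []
    for i in range(len(X)):
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: dist(X[i], X[j], dims))
        order.append(others)
    return order

full = neighbour_ranks(X, range(d))
reduced = neighbour_ranks(X, range(d_reduced))  # naive reduction: keep first 4 features

def preservation(rank):
    """Fraction of points whose rank-`rank` neighbour is unchanged after reduction."""
    return sum(full[i][rank] == reduced[i][rank] for i in range(n)) / n

print("nearest preserved:    ", preservation(0))
print("median-rank preserved:", preservation(n // 2 - 1))
print("farthest preserved:   ", preservation(n - 2))
```

Comparing the three printed fractions is one way to probe the abstract's claim that the extreme ranks (nearest, farthest) are preserved more reliably than the middle ones.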
Similar articles
Pseudo-Likelihood Inference Underestimates Model Uncertainty: Evidence from Bayesian Nearest Neighbours
When using the K-nearest neighbours (KNN) method, one often ignores the uncertainty in the choice of K. To account for this uncertainty, Bayesian KNN (BKNN) has been proposed and studied (Holmes and Adams 2002; Cucala et al. 2009). We present evidence that the pseudo-likelihood approach for BKNN, even after the correction of Cucala et al. (2009), still significantly underest...
Ensembles of Nearest Neighbours for Cancer Classification Using Gene Expression Data
It is known that an ensemble of classifiers can outperform the single best classifier if the classifiers in the ensemble are accurate and sufficiently diverse (i.e., their errors are as uncorrelated as possible). We study ensembles of nearest neighbours for cancer classification based on gene expression data. Such ensembles have rarely been used, because the traditional ensemble methods such as ...
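One standard way to make nearest-neighbour members diverse is to give each one a different random feature subset, so their errors decorrelate. The sketch below is a minimal, hypothetical illustration of that idea on synthetic two-class data; it is not the construction from the cited paper.

```python
import random

random.seed(1)

# Hypothetical toy data: two noisy classes separated along every dimension.
def make_point(label):
    centre = 1.0 if label else -1.0
    return [centre + random.gauss(0, 1.5) for _ in range(8)], label

train = [make_point(i % 2) for i in range(60)]
test = [make_point(i % 2) for i in range(20)]

def knn_predict(train, x, dims, k=3):
    """Plain k-NN majority vote using only the feature subset `dims`."""
    neigh = sorted(train, key=lambda p: sum((p[0][j] - x[j]) ** 2 for j in dims))[:k]
    votes = sum(lbl for _, lbl in neigh)
    return 1 if votes * 2 > k else 0

# Each ensemble member sees a different random half of the features;
# differing subsets are what makes the members' errors (partly) uncorrelated.
subsets = [random.sample(range(8), 4) for _ in range(11)]

def ensemble_predict(x):
    votes = sum(knn_predict(train, x, dims) for dims in subsets)
    return 1 if votes * 2 > len(subsets) else 0

acc = sum(ensemble_predict(x) == y for x, y in test) / len(test)
print("ensemble accuracy:", acc)
```

An odd number of members (here 11) avoids tied ensemble votes, mirroring the usual odd-k convention in plain k-NN.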
Minimax rates for cost-sensitive learning on manifolds with approximate nearest neighbours
We study the approximate nearest neighbour method for cost-sensitive classification on low-dimensional manifolds embedded within a high-dimensional feature space. We determine the minimax learning rates for distributions on a smooth manifold, in a cost-sensitive setting. This generalises a classic result of Audibert and Tsybakov. Building upon recent work of Chaudhuri and Dasgupta we prove that...
Evolutionary feature weighting to improve the performance of multi-label lazy algorithms
In the last decade, several modern applications in which each example belongs to more than one label at a time have attracted the attention of machine learning research. Several derivatives of the k-nearest neighbours classifier have been proposed to deal with multi-label data. A k-nearest neighbours classifier depends strongly on the definition of its distance function, which i...
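Feature weighting changes the distance function that k-NN is so sensitive to: each feature's contribution is scaled by a learned weight. As a minimal sketch of the idea, the hypothetical example below searches for weights with plain random search (a stand-in for the evolutionary search the title refers to, not a reproduction of it) on toy single-label data where only one feature is informative.

```python
import random
import math

random.seed(2)

def weighted_dist(a, b, w):
    """Euclidean distance with per-feature weights w."""
    return math.sqrt(sum(wi * (ai - bi) ** 2 for wi, ai, bi in zip(w, a, b)))

# Toy data: the class depends only on feature 0; features 1-2 are loud noise.
data = [([random.gauss(lbl, 0.3), random.gauss(0, 3), random.gauss(0, 3)], lbl)
        for lbl in (0, 1) for _ in range(30)]
random.shuffle(data)
train, valid = data[:40], data[40:]

def knn_accuracy(w, k=3):
    """Validation accuracy of 3-NN under the weighted distance."""
    correct = 0
    for x, y in valid:
        neigh = sorted(train, key=lambda p: weighted_dist(p[0], x, w))[:k]
        pred = round(sum(lbl for _, lbl in neigh) / k)
        correct += pred == y
    return correct / len(valid)

# Stand-in for the evolutionary search: random search over weight vectors.
best_w, best_acc = [1.0, 1.0, 1.0], knn_accuracy([1.0, 1.0, 1.0])
for _ in range(50):
    w = [random.random() for _ in range(3)]
    acc = knn_accuracy(w)
    if acc > best_acc:
        best_w, best_acc = w, acc

print("unweighted accuracy:   ", knn_accuracy([1.0, 1.0, 1.0]))
print("best weighted accuracy:", best_acc)
```

A genuine evolutionary algorithm would replace the random-search loop with a population, crossover, and mutation, but the fitness function (k-NN accuracy under a candidate weight vector) would be the same.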
Image representation and processing through multiscale local jet features
We propose a unified framework for representing and processing images using a feature space related to local similarity. We choose the multiscale and versatile local jet feature space to represent the visual data. This feature space may be reduced by vector quantisation and/or represented by data structures enabling efficient nearest-neighbour search (e.g. kd-trees). We show the interest of...